Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote

نویسندگان

  • Noëmi Aepli
  • Ruprecht von Waldenfels
  • Tanja Samardzic
چکیده

In this paper, we present an approach to developing resources for a low-resource language, taking advantage of the fact that it is closely related to languages with more resources. In particular, we test our approach on Macedonian, which lacks tools for natural language processing as well as data in order to build such tools. We improve the Macedonian training set for supervised part-ofspeech tagging by transferring available manual annotations from a number of similar languages. Our approach is based on multilingual parallel corpora, automatic word alignment, and a set of rules (majority vote). The performance of a tagger trained on the improved data set of 88% accuracy is significantly better than the baseline of 76%. It can serve as a stepping stone for further improvement of resources for Macedonian. The proposed approach is entirely automatic and it can be easily adapted to other language in similar circumstances.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

معرفی رویکردی ماشینی با استفاده از الگوریتم لسک و برچسبدهی نحوی جهت رفع ابهام از معنای کلمات

The present study introduces a machine-based approach for word sense disambiguation (WSD). In Persian, a morphologically complex language, POS tag which lots of homographs are made, one way for doing WSD is allocating the right Part Of Speech (POS) tags to words prior to WSD. Since the frequency of noun and adjective homographs in different Persian POS tag text corpuses is high, POS tag disambi...

متن کامل

Feature-Rich Part-Of-Speech Tagging Using Deep Syntactic and Semantic Analysis

This paper describes the implementation, improvement and evaluation of the machine translation (MT) system proposed by Jackov (2014) when used as a feature-rich part-ofspeech (POS) tagger for Bulgarian. The system does not rely on POS tagging for morphological disambiguation. Instead, all ambiguities are considered in parsing hypotheses that are scored and the best one is used for tagging. The ...

متن کامل

Tagging English by Path Voting Constraints

We describe a constraint-based tagging approach where individual constraint rules vote on sequences of matching tokens and tags. Disambiguation of all tokens in a sentence is performed at the very end by selecting tags that appear on the path that receives the highest vote. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence r...

متن کامل

Tagging French Without Lexical Probabilities - Combining Linguistic Knowledge And Statistical Learning

This paper explores morpho-syntactic ambiguities for French to develop a strategy for part-of-speech disambiguation that a) reflects the complexity of French as an inflected language, b) optimizes the estimation of probabilities, c) allows the user flexibility in choosing a tagset. The problem in extracting lexical probabilities from a limited training corpus is that the statistical model may n...

متن کامل

Use of Weighted Finite State Transducers inPart of Speech

This paper addresses issues in part of speech disambiguation using nite-state transducers and presents two main contributions to the eld. One of them is the use of nite-state machines for part of speech tagging. Linguistic and statistical information is represented in terms of weights on transitions in weighted nite-state transducers. Another contribution is the successful combination of techni...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014